edgeworth expansion
Optimal Non-Asymptotic Edgeworth Expansions for Multivariate Neural Network Outputs
Finite-width fully connected neural networks with Gaussian-initialized weights deviate from their infinite-width Gaussian limit, exhibiting non-vanishing higher-order cumulants. We approximate these deviations, for a neural network evaluated in a finite number of inputs, using multidimensional Edgeworth expansions of arbitrary order $4m-1$, with $m\in\mathbb{N}$. Assuming that the corresponding Gaussian limit has an invertible covariance matrix and that the activation function is polynomially bounded, we establish a bound of order $n^{-m}$ on the total variation distance between the law of the true network output and its Edgeworth approximation, with matching lower bounds. As an application, we quantify the error in Bayesian posterior distributions when the prior is replaced by its Edgeworth expansion. Our results are more general and also apply to sequences of conditionally Gaussian vectors converging to a Gaussian vector with invertible covariance.
End-Cut Preference in Survival Trees
The end-cut preference (ECP) problem, referring to the tendency to favor split points near the boundaries of a feature's range, is a well-known issue in CART (Breiman et al., 1984). ECP may induce highly imbalanced and biased splits, obscure weak signals, and lead to tree structures that are both unstable and difficult to interpret. For survival trees, we show that ECP also arises when using greedy search to select the optimal cutoff point by maximizing the log-rank test statistic. To address this issue, we propose a smooth sigmoid surrogate (SSS) approach, in which the hard-threshold indicator function is replaced by a smooth sigmoid function. We further demonstrate, both theoretically and through numerical illustrations, that SSS provides an effective remedy for mitigating or avoiding ECP.
Higher-Order Asymptotics of Test-Time Adaptation for Batch Normalization Statistics
This study develops a higher-order asymptotic framework for test-time adaptation (TTA) of Batch Normalization (BN) statistics under distribution shift by integrating classical Edgeworth expansion and saddlepoint approximation techniques with a novel one-step M-estimation perspective. By analyzing the statistical discrepancy between training and test distributions, we derive an Edgeworth expansion for the normalized difference in BN means and obtain an optimal weighting parameter that minimizes the mean-squared error of the adapted statistic. Reinterpreting BN TTA as a one-step M-estimator allows us to derive higher-order local asymptotic normality results, which incorporate skewness and other higher moments into the estimator's behavior. Moreover, we quantify the trade-offs among bias, variance, and skewness in the adaptation process and establish a corresponding generalization bound on the model risk. The refined saddlepoint approximations further deliver uniformly accurate density and tail probability estimates for the BN TTA statistic. These theoretical insights provide a comprehensive understanding of how higher-order corrections and robust one-step updating can enhance the reliability and performance of BN layers in adapting to changing data distributions.
CLT and Edgeworth Expansion for m-out-of-n Bootstrap Estimators of The Studentized Median
Banerjee, Imon, Chakrabarty, Sayak
The m-out-of-n bootstrap, originally proposed by Bickel, Gotze, and Zwet (1992), approximates the distribution of a statistic by repeatedly drawing m subsamples (with m much smaller than n) without replacement from an original sample of size n. It is now routinely used for robust inference with heavy-tailed data, bandwidth selection, and other large-sample applications. Despite its broad applicability across econometrics, biostatistics, and machine learning, rigorous parameter-free guarantees for the soundness of the m-out-of-n bootstrap when estimating sample quantiles have remained elusive. This paper establishes such guarantees by analyzing the estimator of sample quantiles obtained from m-out-of-n resampling of a dataset of size n. We first prove a central limit theorem for a fully data-driven version of the estimator that holds under a mild moment condition and involves no unknown nuisance parameters. We then show that the moment assumption is essentially tight by constructing a counter-example in which the CLT fails. Strengthening the assumptions slightly, we derive an Edgeworth expansion that provides exact convergence rates and, as a corollary, a Berry Esseen bound on the bootstrap approximation error. Finally, we illustrate the scope of our results by deriving parameter-free asymptotic distributions for practical statistics, including the quantiles for random walk Metropolis-Hastings and the rewards of ergodic Markov decision processes, thereby demonstrating the usefulness of our theory in modern estimation and learning tasks.
On the number of modes of Gaussian kernel density estimators
Geshkovski, Borjan, Rigollet, Philippe, Sun, Yihang
We consider the Gaussian kernel density estimator with bandwidth $\beta^{-\frac12}$ of $n$ iid Gaussian samples. Using the Kac-Rice formula and an Edgeworth expansion, we prove that the expected number of modes on the real line scales as $\Theta(\sqrt{\beta\log\beta})$ as $\beta,n\to\infty$ provided $n^c\lesssim \beta\lesssim n^{2-c}$ for some constant $c>0$. An impetus behind this investigation is to determine the number of clusters to which Transformers are drawn in a metastable state.
Hedging and Approximate Truthfulness in Traditional Forecasting Competitions
Monroe, Mary, Thilagar, Anish, Hsu, Melody, Frongillo, Rafael
In forecasting competitions, the traditional mechanism scores the predictions of each contestant against the outcome of each event, and the contestant with the highest total score wins. While it is well-known that this traditional mechanism can suffer from incentive issues, it is folklore that contestants will still be roughly truthful as the number of events grows. Yet thus far the literature lacks a formal analysis of this traditional mechanism. This paper gives the first such analysis. We first demonstrate that the ''long-run truthfulness'' folklore is false: even for arbitrary numbers of events, the best forecaster can have an incentive to hedge, reporting more moderate beliefs to increase their win probability. On the positive side, however, we show that two contestants will be approximately truthful when they have sufficient uncertainty over the relative quality of their opponent and the outcomes of the events, a case which may arise in practice.